ENH: image and transform baselines and experiment notebook testing #26

Merged: aylward merged 2 commits into main from testing_img_tfm on Feb 12, 2026

@aylward (Collaborator) commented Feb 9, 2026

  • Add TestTools (test_tools.py): compare 2D/3D image slices and ITK transforms to baselines with configurable tolerances; ITK .mha I/O
  • Add notebook_utils.running_as_test() for reduced params when run via pytest
  • Add --run-experiments flag, experiment marker, and tests/baselines
  • Use TestTools in test_register_time_series_images for image and transform comparison
  • Update experiment notebooks and test docs (EXPERIMENT_FLAG_USAGE, EXPERIMENT_TESTS_GUIDE)

Copilot AI review requested due to automatic review settings February 9, 2026 22:28
@aylward changed the title from "ENH: image and transform baseline testing and experiment notebook tes…" to "ENH: image and transform baselines and experiment notebook testing" on Feb 9, 2026

Copilot AI left a comment

Pull request overview

This PR adds baseline-based regression testing utilities for images/transforms and introduces a “running as test” mechanism for experiment notebooks (via PHYSIOMOTION_RUNNING_AS_TEST) so notebooks can run with reduced parameters under pytest.

Changes:

  • Added TestTools utilities to write/compare ITK images/transforms against baselines with tolerances.
  • Added notebook_utils.running_as_test() and updated experiment test runner to set PHYSIOMOTION_RUNNING_AS_TEST=1 when executing notebooks.
  • Updated time-series registration tests and multiple experiment notebooks/docs to use the new testing flow.
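
For illustration, a helper like running_as_test() could be as small as the sketch below. The actual code in notebook_utils is not shown in this conversation, so the exact check (only the value "1" counts as test mode) and the choose_iterations helper are assumptions made for the example.

```python
import os

# Variable name taken from the overview above; everything else is assumed.
ENV_VAR = "PHYSIOMOTION_RUNNING_AS_TEST"


def running_as_test() -> bool:
    """Return True when the test runner has set the flag to "1"."""
    return os.environ.get(ENV_VAR, "0") == "1"


def choose_iterations(full: int = 100, quick: int = 2) -> int:
    """Hypothetical helper: pick a reduced iteration count under test."""
    return quick if running_as_test() else full
```

A notebook cell could then call choose_iterations() once near the top and reuse the result, so the quick and full code paths stay identical apart from the parameter value.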

Reviewed changes

Copilot reviewed 39 out of 40 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| tests/test_register_time_series_images.py | Switches from “write artifacts” to baseline comparison workflow for key outputs. |
| tests/test_experiments.py | Passes PHYSIOMOTION_RUNNING_AS_TEST=1 into notebook execution subprocess env. |
| tests/conftest.py | Adds tests/baselines directory to the test_directories fixture; refactors a data-path fixture. |
| tests/baselines/.gitkeep | Ensures baselines directory is present in the repo. |
| tests/EXPERIMENT_TESTS_GUIDE.md | Documents PHYSIOMOTION_RUNNING_AS_TEST and recommended notebook checks. |
| tests/EXPERIMENT_FLAG_USAGE.md | Documents the new test-mode flag behavior and links to guide. |
| src/physiomotion4d/test_tools.py | New baseline comparison/writer utilities for ITK images/transforms. |
| src/physiomotion4d/physiomotion4d_base.py | Adds warning filters for specific SWIG-related DeprecationWarnings. |
| src/physiomotion4d/notebook_utils.py | Adds running_as_test() helper to detect test-mode in notebooks. |
| pyproject.toml | Enables always-on warnings (-W always) in pytest addopts. |
| experiments/README.md | Documents the test-mode flag for experiment notebooks. |
| experiments/Reconstruct4DCT/reconstruct_4d_ct_class.ipynb | Uses running_as_test() to select quick vs full run parameters. |
| experiments/Reconstruct4DCT/reconstruct_4d_ct.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/Experiment_SubSurfaceScatter.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/Experiment_SegReg.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/Experiment_CombineModels.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/Experiment_ArrangeOnStage.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/2-paint_dirlab_models.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/1-make_dirlab_models.ipynb | Notebook execution metadata updated (timings). |
| experiments/Lung-GatedCT_To_USD/0-register_dirlab_4dct.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-VTKSeries_To_USD/1-heart_vtkseries_to_usd.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-VTKSeries_To_USD/0-download_and_convert_4d_to_3d.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-Statistical_Model_To_Patient/heart_model_to_patient.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-Statistical_Model_To_Patient/heart_model_to_model_registration_pca.ipynb | Uses running_as_test() to reduce iterations when executed under pytest. |
| experiments/Heart-Statistical_Model_To_Patient/heart_model_to_model_icp_itk.ipynb | Notebook execution metadata updated (timings) and widget state changes. |
| experiments/Heart-GatedCT_To_USD/test_vista3d_inMem.ipynb | Notebook execution metadata and widget state changes. |
| experiments/Heart-GatedCT_To_USD/test_vista3d_class.ipynb | Notebook execution metadata and widget state changes. |
| experiments/Heart-GatedCT_To_USD/4-merge_dynamic_and_static_usd.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-GatedCT_To_USD/3-transform_dynamic_and_static_contours.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-GatedCT_To_USD/2-generate_segmentation.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-GatedCT_To_USD/1-register_images.ipynb | Notebook execution metadata updated (timings). |
| experiments/Heart-GatedCT_To_USD/0-download_and_convert_4d_to_3d.ipynb | Now contains execution counts and outputs (not cleared). |
| experiments/Heart-Create_Statistical_Model/4-surfaces_aligned_correspond_to_pca_inputs.ipynb | Notebook execution metadata and widget state changes. |
| experiments/Heart-Create_Statistical_Model/2-input_surfaces_to_surfaces_aligned.ipynb | Notebook execution metadata and widget state changes. |
| experiments/Heart-Create_Statistical_Model/1-input_meshes_to_input_surfaces.ipynb | Notebook execution metadata and widget state changes. |
| experiments/Convert_VTK_To_USD/convert_vtk_to_usd_using_class.ipynb | Notebook execution metadata updated (timings). |
| experiments/Convert_VTK_To_USD/convert_chop_valve_to_usd.ipynb | Notebook execution metadata updated (timings). |
| experiments/Colormap-VTK_To_USD/colormap_vtk_to_usd.ipynb | Notebook execution metadata updated (timings). |

Comment on lines +171 to 175
moving_image, "basic_time_series_registered_0.mha"
)
test_tools.compare_result_to_baseline_image(
"basic_time_series_registered_0.mha",
)
Copilot AI Feb 9, 2026

The baseline image comparison result is ignored here as well, so the test can pass even if the registered image differs from the baseline. Please assert the boolean return (or raise on failure in TestTools).
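
A minimal sketch of the requested change, using a stand-in class because TestTools itself is not shown here. FakeTestTools and check_registered_image are hypothetical names; only the method name and the boolean return come from this review.

```python
class FakeTestTools:
    """Hypothetical stand-in for TestTools; only the boolean return matters."""

    def __init__(self, matches: bool) -> None:
        self._matches = matches

    def compare_result_to_baseline_image(self, filename: str) -> bool:
        # The real method compares files on disk; this stub reports a fixed result.
        return self._matches


def check_registered_image(test_tools: FakeTestTools) -> None:
    # Asserting the return value turns a logged mismatch into a test failure.
    assert test_tools.compare_result_to_baseline_image(
        "basic_time_series_registered_0.mha"
    ), "registered image differs from baseline"
```

The same pattern would apply to each compare_result_to_baseline_transform call in the test file.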

Comment on lines +223 to +227
forward_transforms[0], "prior_forward_transform_0.hdf"
)
test_tools.compare_result_to_baseline_transform(
"prior_forward_transform_0.hdf",
)
Copilot AI Feb 9, 2026

The baseline transform comparison returns a boolean but isn't asserted, so regressions won't fail this test. Please assert the return value (or have TestTools raise on failure).

Comment on lines +389 to 393
registered_image, "transform_application_time_series_0.mha"
)
test_tools.compare_result_to_baseline_image(
"transform_application_time_series_0.mha",
)
Copilot AI Feb 9, 2026

The baseline comparison return value is ignored here too, so this test can pass even if the image differs from the baseline. Please assert the return value (or raise on failure).

Comment on lines +202 to +206
if not baseline_path.exists():
shutil.copy(str(results_path), str(baseline_path))
self.log_warning(
"Baseline transform did not exist; copied results transform: %s",
results_path,
Copilot AI Feb 9, 2026

Auto-creating the baseline by copying the current result when the baseline file is missing makes the comparison meaningless on a fresh checkout/CI run (it will always pass). Consider failing when the baseline is missing by default, and only allowing baseline generation behind an explicit flag/env var (e.g. UPDATE_BASELINES=1).
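
One way this policy could look, as a sketch rather than the project's implementation. ensure_baseline is a hypothetical function name, and UPDATE_BASELINES is the example variable suggested above, not an existing setting.

```python
import os
import shutil
from pathlib import Path


def ensure_baseline(results_path: Path, baseline_path: Path) -> None:
    """Fail on a missing baseline unless UPDATE_BASELINES=1 is set."""
    if baseline_path.exists():
        return
    if os.environ.get("UPDATE_BASELINES") == "1":
        # Explicit opt-in: seed the baseline from the current result.
        baseline_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(str(results_path), str(baseline_path))
        return
    raise FileNotFoundError(
        f"Baseline missing: {baseline_path}. "
        "Set UPDATE_BASELINES=1 to create it from the current result."
    )
```

With this shape, a fresh checkout or CI run fails loudly on a missing baseline, and baselines are only created when someone asks for it.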

Comment thread src/physiomotion4d/test_tools.py Outdated
Comment on lines +222 to +224
self._last_transform_num_values_above_tol = int(
np.sum(diff_squared > per_value_tol)
)
Copilot AI Feb 9, 2026

diff_squared is compared against per_value_tol here, which makes tolerance semantics inconsistent (squared values vs linear tolerance). Either compare abs(diff) > per_value_tol, or rename/document the argument as a squared tolerance.
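
A small pure-Python illustration of why the two comparisons disagree. The function names and the sample values are made up for the example; the project code uses NumPy arrays instead of lists.

```python
def count_above_tol(result, baseline, per_value_tol):
    """Count parameters whose absolute difference exceeds the tolerance."""
    return sum(abs(r - b) > per_value_tol for r, b in zip(result, baseline))


def count_above_tol_squared(result, baseline, per_value_tol):
    """Inconsistent variant: squared difference against a linear tolerance."""
    return sum((r - b) ** 2 > per_value_tol for r, b in zip(result, baseline))


result = [1.0, 2.5, 10.0]
baseline = [1.5, 2.5, 10.1]
print(count_above_tol(result, baseline, 0.3))          # 1: the 0.5 deviation is flagged
print(count_above_tol_squared(result, baseline, 0.3))  # 0: 0.5**2 = 0.25 slips under 0.3
```

For differences below 1 the squared check is more lenient than the stated tolerance, and for differences above 1 it is stricter, so the reported count does not mean what the argument name suggests.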

Comment on lines +77 to +81
"outputs": [
{
"data": {
"text/plain": [
"'./results//slice_fixed.mha'"
Copilot AI Feb 9, 2026

This notebook cell has committed execution output (non-empty outputs). Please clear cell outputs (and any widget state/output) before committing so the notebook remains deterministic and the repo doesn’t grow from embedded outputs.

Comment on lines +164 to +168
forward_transforms[0], "basic_forward_transform_0.hdf"
)
test_tools.compare_result_to_baseline_transform(
"basic_forward_transform_0.hdf",
)
Copilot AI Feb 9, 2026

The baseline transform comparison returns a boolean but the test doesn't assert it, so mismatches will not fail the test (only get logged). Please assert the return value (or have TestTools raise on failure).

Comment on lines +230 to +234
moving_image, "prior_time_series_registered_0.mha"
)
test_tools.compare_result_to_baseline_image(
"prior_time_series_registered_0.mha",
)
Copilot AI Feb 9, 2026

The baseline image comparison return value is ignored, so mismatches won't fail the test. Please assert the return value (or have TestTools raise on failure).

Comment on lines +208 to +212
transform = itk.transformread(str(results_path))
transform_params = np.array(transform[0].GetParameters())

baseline_transform = itk.transformread(str(baseline_path))
baseline_transform_params = np.array(baseline_transform[0].GetParameters())
Copilot AI Feb 9, 2026

transformread() returns a list; this code assumes both reads return at least one transform and that parameter vectors are the same length. Please validate the number of transforms read and that len(parameters) matches before computing the diff, and raise a clear error otherwise.
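
A sketch of the validation step, written against plain Python sequences so it runs without ITK. checked_parameter_diff is a hypothetical name, and each argument stands in for the parameter vectors extracted from the list that itk.transformread() returns (one vector per transform).

```python
from typing import List, Sequence


def checked_parameter_diff(
    result_params: Sequence[Sequence[float]],
    baseline_params: Sequence[Sequence[float]],
) -> List[float]:
    """Validate counts and lengths, then diff the first transform's parameters."""
    if not result_params or not baseline_params:
        raise ValueError("expected at least one transform in each file")
    if len(result_params) != len(baseline_params):
        raise ValueError(
            f"transform count mismatch: {len(result_params)} result vs "
            f"{len(baseline_params)} baseline"
        )
    first, base = result_params[0], baseline_params[0]
    if len(first) != len(base):
        raise ValueError(
            f"parameter length mismatch: {len(first)} result vs {len(base)} baseline"
        )
    return [r - b for r, b in zip(first, base)]
```

Raising before the subtraction gives a message that names the actual problem instead of an IndexError or a shape error from the array arithmetic.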

Comment on lines 4 to +8
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-04T02:35:36.101493Z",
"iopub.status.busy": "2026-02-04T02:35:36.100494Z",
"iopub.status.idle": "2026-02-04T02:35:51.037775Z",
"shell.execute_reply": "2026-02-04T02:35:51.036978Z"
"iopub.execute_input": "2026-02-09T04:51:59.431418Z",
Copilot AI Feb 9, 2026

This notebook is committed with non-null execution_count values. The experiment test runner is designed to clear outputs/execution counts to keep the repo clean, so please reset execution_count back to null before committing.
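
For reference, a standard-library sketch of clearing a notebook before committing. This is not the experiment test runner's code; clear_notebook and clear_notebook_file are hypothetical helpers that operate on the .ipynb JSON directly.

```python
import json
from pathlib import Path


def clear_notebook(nb: dict) -> dict:
    """Reset execution counts and drop outputs in a parsed .ipynb dict, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
            cell["outputs"] = []
            # Drop per-cell timing metadata such as "iopub.execute_input".
            cell.get("metadata", {}).pop("execution", None)
    # Saved widget state lives at the notebook level under metadata.widgets.
    nb.get("metadata", {}).pop("widgets", None)
    return nb


def clear_notebook_file(path: Path) -> None:
    """Rewrite a notebook file with outputs and execution state removed."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    clear_notebook(nb)
    path.write_text(json.dumps(nb, indent=1) + "\n", encoding="utf-8")
```

Running something like this from a pre-commit hook would keep execution counts, timings, and widget state out of future diffs.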

- TestTools: compare_result_to_baseline_transform and compare_result_to_baseline_image
- TestRegisterTimeSeriesImages: baseline .hdf transforms and .mha images
- pytest --create-baselines support; CI and test docs updated
@aylward aylward merged commit a291bbf into main Feb 12, 2026
11 checks passed
@aylward aylward deleted the testing_img_tfm branch February 12, 2026 17:17